Bayesian Compressed Regression
As an alternative to variable selection or shrinkage in high dimensional
regression, we propose to randomly compress the predictors prior to analysis.
This dramatically reduces storage and computational bottlenecks, performing
well when the predictors can be projected to a low dimensional linear subspace
with minimal loss of information about the response. As opposed to existing
Bayesian dimensionality reduction approaches, the exact posterior distribution
conditional on the compressed data is available analytically, speeding up
computation by many orders of magnitude while also bypassing robustness issues
due to convergence and mixing problems with MCMC. Model averaging is used to
reduce sensitivity to the random projection matrix, while accommodating
uncertainty in the subspace dimension. Strong theoretical support is provided
for the approach by showing near parametric convergence rates for the
predictive density in the large p, small n asymptotic paradigm. Practical
performance relative to competitors is illustrated in simulations and real data
applications.
Comment: 29 pages, 4 figures
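The core idea can be sketched in a few lines, assuming a conjugate Gaussian prior with known noise variance (a simplification of the paper's full model, which also averages over projections and subspace dimensions; all names and dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n < p, with the response driven by a low-dimensional projection of X.
n, p, m = 100, 1000, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Randomly compress the *predictors* before analysis: p columns -> m columns.
Phi = rng.normal(size=(p, m)) / np.sqrt(m)
Z = X @ Phi                      # n x m compressed design

# Conjugate Gaussian prior beta_c ~ N(0, tau2 I) with known noise variance sigma2
# makes the posterior on the compressed coefficients available in closed form,
# so no MCMC is required.
sigma2, tau2 = 0.25, 10.0
prec = Z.T @ Z / sigma2 + np.eye(m) / tau2
post_cov = np.linalg.inv(prec)
post_mean = post_cov @ Z.T @ y / sigma2

# Prediction at new points: compress, then apply the posterior mean.
X_new = rng.normal(size=(5, p))
y_pred = (X_new @ Phi) @ post_mean
```

The expensive object is now an m x m matrix inverse rather than anything of size p, which is where the storage and computational savings come from.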
Bayesian Conditional Density Filtering
We propose a Conditional Density Filtering (C-DF) algorithm for efficient
online Bayesian inference. C-DF adapts MCMC sampling to the online setting,
sampling from approximations to conditional posterior distributions obtained by
propagating surrogate conditional sufficient statistics (a function of data and
parameter estimates) as new data arrive. These quantities eliminate the need to
store or process the entire dataset simultaneously, and often yield reduced
memory requirements and runtime, improved mixing, and state-of-the-art
parameter inference and prediction. These improvements are demonstrated through several
illustrative examples including an application to high dimensional compressed
regression. Finally, we show that C-DF samples converge to the target posterior
distribution asymptotically as sampling proceeds and more data arrives.
Comment: 41 pages, 7 figures, 12 tables
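A minimal sketch of the statistic-propagation idea, in the simplified setting of Gaussian linear regression with known noise variance (the C-DF algorithm handles general conditional posteriors; names and dimensions here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma2, tau2 = 3, 1.0, 5.0

# Surrogate conditional sufficient statistics: everything the conditional
# posterior of beta needs from past data is captured by (S_xx, S_xy), so raw
# batches can be discarded after they are absorbed.
S_xx = np.zeros((d, d))
S_xy = np.zeros(d)

beta_true = np.array([1.0, -2.0, 0.5])

for t in range(50):                      # data arrive in streaming batches
    X_t = rng.normal(size=(20, d))
    y_t = X_t @ beta_true + rng.normal(scale=1.0, size=20)
    S_xx += X_t.T @ X_t                  # propagate the statistics
    S_xy += X_t.T @ y_t

    # Conditional posterior of beta given the current statistics (Gaussian here):
    prec = S_xx / sigma2 + np.eye(d) / tau2
    cov = np.linalg.inv(prec)
    mean = cov @ (S_xy / sigma2)
    beta_draw = rng.multivariate_normal(mean, cov)   # one C-DF-style draw
```

Memory usage is fixed at O(d^2) regardless of how many batches have arrived, which is the practical point of propagating statistics instead of data.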
Bayesian Mixed Effect Sparse Tensor Response Regression Model with Joint Estimation of Activation and Connectivity
Brain activation and connectivity analyses in task-based functional magnetic
resonance imaging (fMRI) experiments with multiple subjects are currently at
the forefront of data-driven neuroscience. In such experiments, interest often
lies in understanding activation of brain voxels due to external stimuli and
strong association, or connectivity, between the measurements on pre-specified
groups of brain voxels, also known as regions of interest (ROIs).
This article proposes a joint Bayesian additive mixed modeling framework that
simultaneously assesses brain activation and connectivity patterns from
multiple subjects. In particular, fMRI measurements from each individual
obtained in the form of a multi-dimensional array/tensor at each time are
regressed on functions of the stimuli. We impose a low-rank PARAFAC
decomposition on the tensor regression coefficients corresponding to the
stimuli to achieve parsimony. Multiway stick breaking shrinkage priors are
employed to infer activation patterns and associated uncertainties in each
voxel. Further, the model introduces region specific random effects which are
jointly modeled with a Bayesian Gaussian graphical prior to account for the
connectivity among pairs of ROIs. Empirical investigations under various
simulation studies demonstrate the effectiveness of the method as a tool to
simultaneously assess brain activation and connectivity. The method is then
applied to a multi-subject fMRI dataset from a balloon-analog risk-taking
experiment in order to make inference about how the brain processes risk.
Comment: 27 pages, 7 figures
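The parsimony gained from the low-rank PARAFAC (CP) decomposition can be sketched as follows; the grid size, rank, and factor draws are illustrative, and the actual model places multiway stick-breaking shrinkage priors on the margins rather than sampling them freely:

```python
import numpy as np

rng = np.random.default_rng(2)

# Rank-R PARAFAC coefficient for a 3-way voxel grid: the full coefficient
# tensor has d1*d2*d3 entries, but the decomposition stores only
# R*(d1+d2+d3) parameters.
d1, d2, d3, R = 20, 20, 20, 3
U = rng.normal(size=(d1, R))   # tensor margins (carry shrinkage priors in the model)
V = rng.normal(size=(d2, R))
W = rng.normal(size=(d3, R))

# B[i,j,k] = sum_r U[i,r] * V[j,r] * W[k,r]
B = np.einsum('ir,jr,kr->ijk', U, V, W)

full_params = d1 * d2 * d3           # 8000 entries in the unstructured coefficient
cp_params = R * (d1 + d2 + d3)       # 180 parameters under rank-3 PARAFAC
```

Shrinkage on the margins U, V, W then translates into voxel-level activation inference on the reconstructed tensor B.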
Covariate-Dependent Clustering of Undirected Networks with Brain-Imaging Data
This article focuses on model-based clustering of subjects based on the shared relationships between subject-specific networks and covariates, in scenarios where the relationship between networks and covariates differs across groups of subjects. It is also of interest to identify the network nodes significantly associated with each covariate in each cluster of subjects. To address these methodological questions, we propose a novel nonparametric Bayesian mixture modeling framework with
an undirected network response and scalar predictors. The symmetric matrix coefficients corresponding to the scalar predictors of interest in each mixture component are assumed to be low rank, with group sparsity within the low-rank structure. While the low-rank structure in the network coefficients adds parsimony and computational efficiency, the group sparsity within the low-rank structure enables drawing inference on network nodes and cells significantly associated with each scalar predictor. Being a principled
Bayesian mixture modeling framework, our approach allows model-based identification of the number of clusters, offers clustering uncertainty in terms of the co-clustering matrix and presents precise characterization of uncertainty in identifying network nodes
significantly related to a predictor in each cluster. Empirical results in various simulation scenarios illustrate substantial inferential gains of the proposed framework in comparison with competitors. Analysis of a real brain connectome dataset using the
proposed method provides interesting insights into the brain regions of interest (ROIs) significantly related to creative achievement in each cluster of subjects.
NSF-DMS 2220840, NSF-DMS 221067
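A minimal sketch of one such network coefficient, assuming a simple U diag(lam) U^T low-rank form with entire rows of U zeroed out for inactive nodes (the paper's prior achieves node selection through shrinkage within a Bayesian mixture rather than hard zeros; all names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Symmetric low-rank coefficient for a V-node undirected network response:
# B = U diag(lam) U^T, with rows of U set to zero for nodes not associated
# with the predictor (group sparsity within the low-rank structure).
V_nodes, R = 10, 2
U = rng.normal(size=(V_nodes, R))
active = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0], dtype=bool)
U[~active] = 0.0                      # node-level group sparsity
lam = np.array([1.5, -0.8])
B = U @ np.diag(lam) @ U.T            # symmetric V x V coefficient

# Rows and columns of inactive nodes are exactly zero, so the fitted model
# can report which nodes relate to the predictor in each cluster.
```

Zeroing whole rows (rather than individual entries) is what lets inference be reported at the level of network nodes instead of single cells.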
Bayesian Data Sketching for Varying Coefficient Regression Models
Varying coefficient models are popular tools for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily due to prohibitively slow posterior computation with Markov chain Monte Carlo (MCMC) algorithms. We introduce Bayesian data sketching for varying coefficient models to obviate the computational challenges presented by large sample sizes. To address the challenges of analyzing large data, we compress the functional response vector and predictor matrix by a random linear transformation to achieve dimension reduction, and conduct inference on the compressed data. Our approach distinguishes itself from several existing methods for analyzing large functional data in that it requires neither the development of new models or algorithms nor any specialized computational hardware, while delivering fully model-based Bayesian inference. Well-established methods and algorithms for varying coefficient regression models can be applied to the compressed data. We establish posterior contraction rates for estimating the varying coefficients and predicting the outcome at new locations under the randomly compressed data model. We use simulation experiments and a spatially varying coefficient analysis of remotely sensed vegetation data to empirically illustrate the inferential and computational efficiency of our approach.
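The sketching step itself can be illustrated as follows, assuming a Gaussian sketching matrix and ordinary least squares as a stand-in downstream estimator (the paper applies fully Bayesian varying coefficient models to the compressed data; sizes and names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Data sketching: compress the n-dimensional response and the design matrix by
# one random linear map, then fit the *same* regression model to the sketch.
n, p, m = 5000, 8, 200
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=0.3, size=n)

Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # m x n sketching matrix, m << n
y_c, X_c = Phi @ y, Phi @ X                  # compressed data: m rows, not n

# Any off-the-shelf estimator now runs on the m-row sketch:
beta_hat = np.linalg.lstsq(X_c, y_c, rcond=None)[0]
```

Because the same linear map is applied to both the response and the design, the regression structure survives compression, which is why existing models and algorithms can be reused unchanged on the sketched data.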